6 research outputs found

    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming

    Full text link
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the children nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods

    A machine learning approach for forecasting hierarchical time series

    Get PDF
    In this paper, we propose a machine learning approach for forecasting hierarchical time series. When dealing with hierarchical time series, apart from generating accurate forecasts, one needs to select a suitable method for producing reconciled forecasts. Forecast reconciliation is the process of adjusting forecasts to make them coherent across the hierarchy. In literature, coherence is often enforced by using a post-processing technique on the base forecasts produced by suitable time series forecasting methods. On the contrary, our idea is to use a deep neural network to directly produce accurate and reconciled forecasts. We exploit the ability of a deep neural network to extract information capturing the structure of the hierarchy. We impose the reconciliation at training time by minimizing a customized loss function. In many practical applications, besides time series data, hierarchical time series include explanatory variables that are beneficial for increasing the forecasting accuracy. Exploiting this further information, our approach links the relationship between time series features extracted at any level of the hierarchy and the explanatory variables into an end-to-end neural network providing accurate and reconciled point forecasts. The effectiveness of the approach is validated on three real-world datasets, where our method outperforms state-of-the-art competitors in hierarchical forecasting

    An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering

    Full text link
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, is traditionally considered an unsupervised learning task. In recent years, the use of background knowledge to improve the cluster quality and promote interpretability of the clustering process has become a hot research topic at the intersection of mathematical optimization and machine learning research. The problem of taking advantage of background information in data clustering is called semi-supervised or constrained clustering. In this paper, we present a branch-and-cut algorithm for semi-supervised MSSC, where background knowledge is incorporated as pairwise must-link and cannot-link constraints. For the lower bound procedure, we solve the semidefinite programming relaxation of the MSSC discrete optimization model, and we use a cutting-plane procedure for strengthening the bound. For the upper bound, instead, by using integer programming tools, we use an adaptation of the k-means algorithm to the constrained case. For the first time, the proposed global optimization algorithm efficiently manages to solve real-world instances up to 800 data points with different combinations of must-link and cannot-link constraints and with a generic number of features. This problem size is about four times larger than the one of the instances solved by state-of-the-art exact algorithms

    SOS-SDP: An Exact Solver for Minimum Sum-of-Squares Clustering

    No full text
    The minimum sum-of-squares clustering problem (MSSC) consists of partitioning n observations into k clusters in order to minimize the sum of squared distances from the points to the centroid of their cluster. In this paper, we propose an exact algorithm for the MSSC problem based on the branch-and-bound technique. The lower bound is computed by using a cutting-plane procedure in which valid inequalities are iteratively added to the Peng–Wei semidefinite programming (SDP) relaxation. The upper bound is computed with the constrained version of k-means in which the initial centroids are extracted from the solution of the SDP relaxation. In the branch-and bound procedure, we incorporate instance-level must-link and cannot-link constraints to express knowledge about which data points should or should not be grouped together. We manage to reduce the size of the problem at each level, preserving the structure of the SDP problem itself. To the best of our knowledge, the obtained results show that the approach allows us to successfully solve, for the first time, real-world instances up to 4,000 data points
    corecore